Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Partial re-revert of #104336. Only JIT fixes are included. #105596

Merged
merged 5 commits into from
Aug 5, 2024

Conversation

VSadov
Copy link
Member

@VSadov VSadov commented Jul 27, 2024

Re: #102370 (comment)

The epilog is no different from other No-GC regions and its first instruction should be interruptible.
See: https://github.com/dotnet/runtime/pull/104336/files#r1664973906

The rest of the changes is dealing with GS cookie check that is not always tracking GC info correctly.

It is possible that this part was already problematic even before the change (i.e. it was missing multi-register returns on ARM64, tail calls might not have been tracking all live args correctly) - hard to tell if that was causing actual failures though. Even with the epilog interruptibility change, the failures were relatively rare GC+JIT stress scenarios.

@VSadov VSadov added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 27, 2024
@VSadov
Copy link
Member Author

VSadov commented Jul 27, 2024

/azp run runtime-coreclr gcstress-extra, runtime-coreclr gcstress0x3-gcstress0xc

Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@VSadov
Copy link
Member Author

VSadov commented Jul 30, 2024

Figured the reasons for rare GC stress + JIT stress failures on arm64.

When we emit GS cookie check, we do not have correct GC info in codegen. At this point we often do not track return registers or call arguments (if it is a tail call).
GC info tracked by emit is likely correct, but when GS cookie check emits a call, that will pass the GC info down and mess things up.

Here is an example of a repro. The key parts here is that the method is fully interruptible (due to JIT stress?) and has a GS cookie check. The part that it also has a tailcall after the check makes things more likely to fail, but I do not believe a tailcall is strictly required to repro.

; Total bytes of code 92, prolog size 16, PerfScore 21.00, instruction count 23, allocated bytes for code 92 (MethodHash=ec4cfb07) for method System.Globalization.CompareInfo+SortHandleCache:.cctor() (MinOpts)
; ============================================================

; Assembly listing for method System.Collections.Generic.Dictionary`2[System.__Canon,long]:.ctor():this (FullOpts)
; Emitting BLENDED_CODE for generic ARM64 - Windows
; FullOpts code
; optimized code
; fp based frame
; fully interruptible
; No PGO data
; Final local variable assignments
;
;  V00 this         [V00,T00] (  3,  3   )     ref  ->   x0         this class-hnd single-def <System.Collections.Generic.Dictionary`2[System.__Canon,long]>
;  V01 tmp0         [V01,T01] (  1,  1   )     int  ->  [fp+0x14]  do-not-enreg[V] "GSCookie dummy"
;# V02 OutArgs      [V02    ] (  1,  1   )  struct ( 0) [sp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
;  V03 GsCookie     [V03    ] (  1,  1   )    long  ->  [fp+0x18]  do-not-enreg[X] addr-exposed "GSSecurityCookie"
;
; Lcl frame size = 16

G_M46826_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
            stp     fp, lr, [sp, #-0x20]!
            mov     fp, sp
            movz    x1, #0x5678
            movk    x1, #0x1234 LSL #16
            movk    x1, #0xDEF0 LSL #32
            movk    x1, #0x9ABC LSL #48
            str     x1, [fp, #0x18]	// [V03 GsCookie]
						;; size=28 bbWeight=1 PerfScore 4.50
G_M46826_IG02:        ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref
                             ; gcrRegs +[x0]
            mov     w1, wzr
						;; size=4 bbWeight=1 PerfScore 0.50
G_M46826_IG03:        ; bbWeight=1, extend
            mov     x2, xzr
						;; size=4 bbWeight=1 PerfScore 0.50
G_M46826_IG04:        ; bbWeight=1, extend
            movz    x3, #0xC4E0      // code for System.Collections.Generic.Dictionary`2[System.__Canon,long]:.ctor(int,System.Collections.Generic.IEqualityComparer`1[System.__Canon]):this
						;; size=4 bbWeight=1 PerfScore 0.50
G_M46826_IG05:        ; bbWeight=1, extend
            movk    x3, #0x3371 LSL #16
						;; size=4 bbWeight=1 PerfScore 0.50
G_M46826_IG06:        ; bbWeight=1, extend
            movk    x3, #0x7FF9 LSL #32
						;; size=4 bbWeight=1 PerfScore 0.50
G_M46826_IG07:        ; bbWeight=1, extend
            ldr     x3, [x3]
						;; size=4 bbWeight=1 PerfScore 3.00
G_M46826_IG08:        ; bbWeight=1, extend
            movz    xip0, #0x5678
						;; size=4 bbWeight=1 PerfScore 0.50
G_M46826_IG09:        ; bbWeight=1, extend
            movk    xip0, #0x1234 LSL #16
						;; size=4 bbWeight=1 PerfScore 0.50
G_M46826_IG10:        ; bbWeight=1, extend
            movk    xip0, #0xDEF0 LSL #32
						;; size=4 bbWeight=1 PerfScore 0.50
G_M46826_IG11:        ; bbWeight=1, extend
            movk    xip0, #0x9ABC LSL #48
						;; size=4 bbWeight=1 PerfScore 0.50
G_M46826_IG12:        ; bbWeight=1, extend
            ldr     xip1, [fp, #0x18]	// [V03 GsCookie]
						;; size=4 bbWeight=1 PerfScore 2.00
G_M46826_IG13:        ; bbWeight=1, extend
            cmp     xip0, xip1
						;; size=4 bbWeight=1 PerfScore 0.50
G_M46826_IG14:        ; bbWeight=1, isz, extend
            beq     G_M46826_IG16
						;; size=4 bbWeight=1 PerfScore 1.00
G_M46826_IG15:        ; bbWeight=1, extend
            bl      CORINFO_HELP_FAIL_FAST
                             ; gcrRegs -[x0]                                  <==== x0 SHOULD STAY ALIVE HERE !!!
						;; size=4 bbWeight=1 PerfScore 1.00
G_M46826_IG16:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, epilog, nogc
            ldp     fp, lr, [sp], #0x20
            br      x3
						;; size=8 bbWeight=1 PerfScore 2.00

There are pieces of code that try to force the return register to be live, but seems to be missing cases when there are two returns.
There is also code that tries to infer live arguments, and appears working for x86, but in other cases may make live a few registers that are not really live. Not sure why that happens or whether liveness of GC arguments can be figured reliably.

Basically not having the live arg inference causes issues like above. Having it causes other issues (like crashes on x64 when reporting GCs that no longer contain GC refs).

Either inference should work correctly or this whole GS cookie deal should be not interruptible. That is just a few instructions before epilog anyways.

@VSadov
Copy link
Member Author

VSadov commented Jul 30, 2024

What was saving us, was the treatment of the first instruction of epilog as uninterruptible, which is wrong thing to do for regular methods that can do GC or can return. (unlike CORINFO_HELP_FAIL_FAST that does not do either of that).

As simple fix would be to do emitDisableGC(); right before calling CORINFO_HELP_FAIL_FAST. That would effectively make No-GC region related to the epilog to start one instruction earlier.
This fixes the problem, but it looks like emitDisableGC(); is not available for x86 codegen, thus I have a concern about this solution, even though so far we have not seen problems due to this on x86.

Alternatively, I could tell the emit to keep tracking its info when we call calling CORINFO_HELP_FAIL_FAST and ignore one that comes from codegen, but perhaps it is too much special casing.

@VSadov
Copy link
Member Author

VSadov commented Jul 31, 2024

I think emitDisableGC(); is the most portable and future-proof approach as it does not depend on the number of returns or even matching the platform ABI.

x86 could continue doing what is what doing.

@dotnet dotnet deleted a comment from azure-pipelines bot Aug 1, 2024
@dotnet dotnet deleted a comment from azure-pipelines bot Aug 1, 2024
@VSadov VSadov marked this pull request as ready for review August 1, 2024 07:00
@VSadov
Copy link
Member Author

VSadov commented Aug 3, 2024

/azp run runtime-coreclr gcstress-extra, runtime-coreclr gcstress0x3-gcstress0xc

Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@dotnet dotnet deleted a comment from azure-pipelines bot Aug 3, 2024
@VSadov
Copy link
Member Author

VSadov commented Aug 3, 2024

I think this is ready for review. @dotnet/jit-contrib

@@ -594,8 +594,7 @@ bool emitter::emitGenNoGCLst(Callback& cb)
emitter::instrDesc* id = emitFirstInstrDesc(ig->igData);
assert(id != nullptr);
assert(id->idCodeSize() > 0);
if (!cb(ig->igFuncIdx, ig->igOffs, ig->igSize, id->idCodeSize(),
ig->igFlags & (IGF_FUNCLET_PROLOG | IGF_FUNCLET_EPILOG | IGF_EPILOG)))
if (!cb(ig->igFuncIdx, ig->igOffs, ig->igSize, id->idCodeSize(), ig->igFlags & (IGF_FUNCLET_PROLOG)))
Copy link
Member Author

@VSadov VSadov Aug 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the main part of the change. The epilog is no different from other No-GC regions and its first instruction should be interruptible.
See: https://github.com/dotnet/runtime/pull/104336/files#r1664973906

The rest of the changes is dealing with GS cookie check that is not always tracking GC info correctly and tweaking an assert in NativeAOT.

@jakobbotsch
Copy link
Member

When we emit GS cookie check, we do not have correct GC info in codegen. At this point we often do not track return registers or call arguments (if it is a tail call).

I think the GC information should be correct coming out of normal codegen (your change to allow GC on the epilog relies on that already). Did you understand what ends up breaking the GC information between finishing codegen for the BBJ_RETURN block and generating the GS cookie check in the corresponding epilog? It seems to me that we should fix that.

I think emitDisableGC(); is the most portable and future-proof approach as it does not depend on the number of returns or even matching the platform ABI.

The code can be written in a platform agnostic way using compiler->compRetTypeDesc.

@VSadov
Copy link
Member Author

VSadov commented Aug 5, 2024

When we emit GS cookie check, we do not have correct GC info in codegen. At this point we often do not track return registers or call arguments (if it is a tail call).

I think the GC information should be correct coming out of normal codegen (your change to allow GC on the epilog relies on that already). Did you understand what ends up breaking the GC information between finishing codegen for the BBJ_RETURN block and generating the GS cookie check in the corresponding epilog? It seems to me that we should fix that.

I did not find a good documentation on what BBJ_RETURN should do. In my understanding of the implementation, such block assigns return registers (or arguments of a tail call). There is no further consumer for the registers. What comes next is the exit/epilog code that simply assumes that everything is in right place.

The GC info tracked by emitter is correct at this point since it sees assignments of returns and arguments and normally that is sufficient.
The issue is that CodeGen does not. Why - I am not sure, but by the presence of very old workarounds in the code, I suspect that is expected.

Since emit tracks correctly everything works unless GS cookie is involved. Generating the check ends up flowing GC info from codegen to emit, since there is a method call and branch/label.

I think emitDisableGC(); is the most portable and future-proof approach as it does not depend on the number of returns or even matching the platform ABI.

The code can be written in a platform agnostic way using compiler->compRetTypeDesc.

There is code that tries to force codegen to treat return registers and args as live. It refers to a Dev10 bug and that bug hints that just making GS cookie check uninterruptible could have been simpler solution. (at least how I read the bug, it does not tell a lot of story)

What I see is that code also used on RISC platforms, but does not handle multiple return registers where more than one may be returned. It can be fixed using compRetTypeDesc, but should it be fixed?
And how early should we tell codegen that it suddenly has live return registers? Before epilog? Before GS cookie check? Earlier?
The whole deal of expanding GC live set suddenly feels a bit like a hack. Like forcing gcrRegs +[x0] on a random instruction.

The same deal applies to the arguments. In this case I am not sure it can be easily fixed/forced/patched. The same approach as in x86 that just simply forces everything that is classified as a register argument to come alive, seems to be forcing alive bunch of stuff that is not even containing GC references at that point. Maybe there is a bug in that code or maybe it should use the signature+ABI of the calee instead.
Again - perhaps it can be "fixed", but should it be?

We are talking about few instructions (GS check) that are not a part of normal method body (i.e. user-written code), but more like satisfying platform requirements about exiting method.

BTW. Another "fix" could be ensuring that GS check does not flow the codegen info (that is not corect) down to emit (that still tracs correct info). It is possible to use a different kind of label (local label?) that would not flow, but forcing the call not to flow feels like would be too invasive for how things are done currently (i.e. codegen knows and passes correct info at call sites, all calls kill certain registers, etc.. ).

In the end it feels like while more complex solutions may exist, treating GS check as start of the epilog (for the purpose of GC tracking) is simpler with very little downside.
(I think the part that x86 cannot do that is the only downside, really)

@VSadov
Copy link
Member Author

VSadov commented Aug 5, 2024

Another consideration is that even if we make codegen to track returns/tail-args as live coming from BBJ_RETURN, I would not know on which level that must be done (LSRA?) and whether we would want such solution right now.

Perhaps we should take a simpler solution now and follow up with more complex solutions, if we even decide we want that, later.

@jakobbotsch
Copy link
Member

I did not find a good documentation on what BBJ_RETURN should do. In my understanding of the implementation, such block assigns return registers (or arguments of a tail call). There is no further consumer for the registers. What comes next is the exit/epilog code that simply assumes that everything is in right place.

The GC info tracked by emitter is correct at this point since it sees assignments of returns and arguments and normally that is sufficient. The issue is that CodeGen does not. Why - I am not sure, but by the presence of very old workarounds in the code, I suspect that is expected.

I suspect the change codegen would need is to make this code run always:

if (compiler->compMethodReturnsRetBufAddr())
{
gcInfo.gcMarkRegPtrVal(REG_INTRET, TYP_BYREF);
}
else
{
for (unsigned i = 0; i < retRegCount; ++i)
{
if (varTypeIsGC(retTypeDesc.GetReturnRegType(i)))
{
gcInfo.gcMarkRegPtrVal(retTypeDesc.GetABIReturnReg(i, compiler->info.compCallConv),
retTypeDesc.GetReturnRegType(i));
}
}
}

This only runs for compIsProfilerHookNeeded right now (and we seem to clear the registers again after calling the profiler hook). Not sure why it tries so hard to keep the GC information incorrect.

I do not think tailcalls need any special handling. GT_JMP sets up GC info as part of genJmpPlaceArgs and tailcall GT_CALL should do it as part of GT_PUTARG_REG codegen.

What I see is that code also used on RISC platforms, but does not handle multiple return registers where more than one may be returned. It can be fixed using compRetTypeDesc, but should it be fixed?

IMO yes -- codegen and emit should generally agree on GC information at the boundaries. But I can be convinced to make this a .NET 10 item.

@VSadov
Copy link
Member Author

VSadov commented Aug 5, 2024

IMO yes -- codegen and emit should generally agree on GC information at the boundaries. But I can be convinced to make this a .NET 10 item.

My thought exactly. It does not look like the right place to be when codegen loses track of GC info even in this edge case. It breaks the model that "upper layers track at fewer edges, but more precisely".
If possible, that should be fixed and fixed correctly - not by forcing things back to life if we see GS cookie check.

Whether we want the "right" fix in 9.0 - not sure.
The emitDisableGC would seem a more reliable short term solution. Considering that the root cause might be fairly old, it is probably not hurting us in other scenarios.

@VSadov
Copy link
Member Author

VSadov commented Aug 5, 2024

Thanks!!

@VSadov VSadov merged commit d78082b into dotnet:main Aug 5, 2024
140 checks passed
@VSadov VSadov deleted the partReRevert branch August 5, 2024 18:34
@github-actions github-actions bot locked and limited conversation to collaborators Sep 7, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants